18 research outputs found

Protected Health Information filter (Philter): accurately and securely de-identifying free-text clinical notes.

Author: Butte Atul J
Fan Xuancheng
Glicksberg Benjamin S
Goldstein Theodore
Ludwig Dana
Muenzen Kathleen
Norgeot Beau
Oskotsky Boris
Peterson Thomas A
Rutenberg Eugenia
Schenk Gundolf
Schmajuk Gabriela
Sirota Marina
Yazdany Jinoos
Publication venue: eScholarship, University of California
Publication date: 01/01/2020

There is a great and growing need to ascertain the exact state of a patient, in terms of disease progression, actual care practices, pathology, adverse events, and much more, beyond the paucity of data available in structured medical records. Ascertaining these harder-to-reach data elements is now critical for the accurate phenotyping of complex traits, detection of adverse outcomes, assessment of the efficacy of off-label drug use, and longitudinal patient surveillance. Clinical notes often contain the most detailed and relevant digital information about individual patients, the nuances of their diseases, the treatment strategies selected by physicians, and the resulting outcomes. However, notes remain largely unused for research because they contain Protected Health Information (PHI), which is synonymous with individually identifying data. Previous clinical note de-identification approaches have been rigid and too inaccurate to see any substantial real-world use, primarily because they were trained on medical text corpora that were too small. To build a new de-identification tool, we created the largest manually annotated clinical note corpus for PHI and developed a customizable open-source de-identification software called Philter ("Protected Health Information filter"). Here we describe the design and evaluation of Philter, and show how it offers substantial real-world improvements over prior methods.

eScholarship - University of California

Image Alignment for Time-Series Analysis of Protein Crystallisation Trials, The University of Hamburg, diploma thesis

Author: Gundolf Schenk
Tilo Strutz
Victor Lamzin

The automatic analysis of time-series of crystallisation trials still has much room for several diploma theses or even PhD projects. Here, we present a method developed for automatically tracking time-series images such as those obtained during crystallisation trials. This work [Sch06] was supported by th

CiteSeerX

Protein sequence and structure alignments within one framework

Author: Andrew E. Torda
Gundolf Schenk
Thomas Margraf
Publication date: 01/01/2008

CiteSeerX

Crossref

PubMed Central

The SALAMI protein structure search server (doi:10.1093/nar/gkp431)

Author: Andrew E. Torda
Gundolf Schenk
Thomas Margraf

Protein structures often show similarities to one another which would not be seen at the sequence level. Given the coordinates of a protein chain, the SALAMI server at www.zbh.uni-hamburg.de/salami will search the protein data bank and return a set of similar structures without using sequence information. The results page lists the related proteins, details of the sequence and structure similarity, and implied sequence alignments. Via a simple structure viewer, one can view superpositions of query and library structures and finally download superimposed coordinates. The alignment method is very tolerant of large gaps and insertions, and tends to produce slightly longer alignments than other similar programs.

CiteSeerX

Potential for measurement of the distribution of DNA folds in complex environments using Correlated X-ray Scattering

Author: Andrew Spakowitz
Brad Krajina
Gundolf Schenk
Sebastian Doniach
Publication venue: 'World Scientific Pub Co Pte Lt'

Crossref

Impact of Different Approaches to Preparing Notes for Analysis With Natural Language Processing on the Performance of Prediction Models in Intensive Care.

Author: Butte Atul J
Dudley R Adams
Luo Yanting
Mahendra Malini
Mills Hunter
Schenk Gundolf
Publication venue: eScholarship, University of California
Publication date: 01/06/2021

Objectives: To evaluate whether different approaches in note text preparation (known as preprocessing) can impact machine learning model performance in the case of mortality prediction in the ICU. Design: Clinical note text was used to build machine learning models for adults admitted to the ICU. Preprocessing strategies studied were none (raw text), cleaning text, stemming, term frequency-inverse document frequency vectorization, and creation of n-grams. Model performance was assessed by the area under the receiver operating characteristic curve (AUROC). Models were trained and internally validated on University of California San Francisco data using 10-fold cross validation. These models were then externally validated on Beth Israel Deaconess Medical Center data. Setting: ICUs at University of California San Francisco and Beth Israel Deaconess Medical Center. Subjects: Ten thousand patients in the University of California San Francisco training and internal testing dataset and 27,058 patients in the external validation dataset, Beth Israel Deaconess Medical Center. Interventions: None. Measurements and main results: Mortality rate at Beth Israel Deaconess Medical Center and University of California San Francisco was 10.9% and 7.4%, respectively. Data are presented as AUROC (95% CI) for models validated at University of California San Francisco and AUROC for models validated at Beth Israel Deaconess Medical Center. Models built and trained on University of California San Francisco data for the prediction of in-hospital mortality improved from the raw note text model (AUROC, 0.84; CI, 0.80-0.89) to the term frequency-inverse document frequency model (AUROC, 0.89; CI, 0.85-0.94). When applying the models developed at University of California San Francisco to Beth Israel Deaconess Medical Center data, there was a similar increase in model performance from raw note text (AUROC at Beth Israel Deaconess Medical Center: 0.72) to the term frequency-inverse document frequency model (AUROC at Beth Israel Deaconess Medical Center: 0.83). Conclusions: Differences in preprocessing strategies for note text impacted model discrimination. Completing a preprocessing pathway including cleaning, stemming, and term frequency-inverse document frequency vectorization resulted in the preprocessing strategy with the greatest improvement in model performance. Further study is needed, with particular emphasis on how to manage author implicit bias present in note text, before natural language processing algorithms are implemented in the clinical setting.

eScholarship - University of California

PubMed Central

eScholarship - University of California

A certified de-identification system for all clinical text documents for information extraction at scale.

Author: Ashouri Choshali Habibeh
Butte Atul J
Israni Sharat
Muenzen Kathleen
Oskotsky Boris
Plunkett Thomas
Radhakrishnan Lakshmi
Schenk Gundolf
Publication venue: eScholarship, University of California
Publication date: 04/07/2023

Objectives: Clinical notes are a veritable treasure trove of information on a patient's disease progression, medical history, and treatment plans, yet are locked in secured databases accessible for research only after extensive ethics review. Removing personally identifying and protected health information (PII/PHI) from the records can reduce the need for additional Institutional Review Board (IRB) reviews. In this project, our goals were to: (1) develop a robust and scalable clinical text de-identification pipeline that is compliant with the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule for de-identification standards and (2) share routinely updated de-identified clinical notes with researchers. Materials and methods: Building on our open-source de-identification software called Philter, we added features to: (1) make the algorithm and the de-identified data HIPAA compliant, which also implies type 2 error-free redaction, as certified via external audit; (2) reduce over-redaction errors; and (3) normalize and shift date PHI. We also established a streamlined de-identification pipeline using MongoDB to automatically extract clinical notes and provide truly de-identified notes to researchers with periodic monthly refreshes at our institution. Results: To the best of our knowledge, the Philter V1.0 pipeline is currently the first and only certified, de-identified redaction pipeline that makes clinical notes available to researchers for non-human subjects research, without further IRB approval needed. To date, we have made over 130 million certified de-identified clinical notes available to over 600 UCSF researchers. These notes were collected over the past 40 years, and represent data from 2,757,016 UCSF patients.

eScholarship - University of California